Spatial-Temporal Data Mining in Wireless Sensor Networks

 

V. K. Patle

School of Studies in Computer Science and I.T., Pt. Ravishankar Shukla University Raipur C.G.

*Corresponding Author E-mail: patlevinod@gmail.com

 

ABSTRACT:

Data mining techniques are used to discover meaningful knowledge from data sets. These techniques have been applied to intrusion detection, customized marketing, web personalization, and many other real-life problems. Knowledge discovery from sensor data is an emerging research area because of its wide variety of applications in society. Wireless Sensor Networks (WSNs) produce large volumes of data in the form of streams. Spatiotemporal mining of sensor data provides useful information for different applications.

 

KEYWORDS: WSN, Spatial-Temporal Data Mining.

 


1.     INTRODUCTION:

Sensor networks are found in an increasing number of applications in many areas, including battlefields, smart buildings, and even the human body. Most sensor networks consist of a collection of lightweight (possibly mobile) sensors connected via wireless links to each other or to a more powerful gateway node, which is in turn connected to an external network through either wired or wireless connections. Sensor nodes usually communicate in a peer-to-peer architecture over an asynchronous network. In many applications, sensors are deployed in hostile and difficult-to-access locations with constraints on weight, power supply, and cost. Moreover, sensors must process a continuous (possibly fast) stream of data. Data mining in wireless sensor networks (WSNs) is a challenging area, as algorithms need to work in the extremely demanding and constrained environment of sensor networks (bounded energy, storage, bandwidth, and computational power). WSNs also require highly decentralized algorithms [1, 2].

 

The development of algorithms that take into consideration the characteristics of sensor networks, such as energy and computation constraints, network dynamics, and faults, constitutes an area of current research. Some work has been done on developing localized, collaborative, distributed, and self-configuring mechanisms in sensor networks.

In designing algorithms for sensor networks, it is imperative to keep in mind that power consumption has to be minimized. Even gathering the distributed sensor data at a single site can be expensive in terms of battery power consumed, so some attempts have been made towards balancing the energy-quality trade-offs and making the data collection task energy efficient. An important optimization problem is clustering the nodes of the sensor network. Nodes that are clustered together can easily communicate with each other, which can be exploited for energy optimization and motivates the development of optimal algorithms for clustering sensor nodes. Other work in this field includes finding frequent item sets, identifying rare events or anomalies, and data preprocessing in sensor networks [2].

 

After a short introduction to data mining, this paper presents spatial and temporal data mining techniques with reference to wireless sensor networks, including recent related work in this area of research.

 

2.     DATA MINING DEFINITION:

Data mining is a powerful technology with great potential to help academics and industry focus on the most important information in their data warehouses. The two primary goals of data mining tend to be prediction and description. Prediction involves using some variables (or fields) in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, emphasizes finding patterns that describe the data and can be interpreted by humans. Fig. 1 presents a brief architecture of the data mining process. First, a domain-specific wrapper filters the useful data out of the data set to be mined; only this useful data is then mined by the spatial-temporal mining engine; finally, the resulting data is made available to the end users [3, 4].

 

 

Fig. 1 Data Mining Architecture

 

 

3.     TEMPORAL DATA MINING:

Temporal data mining [4, 5] is concerned with the analysis of events ordered by one or more dimensions of time. We differentiate between two broad directions. One concerns the discovery of causal relationships among temporally oriented events. The other concerns the discovery of similar patterns within the same time sequence or among different time sequences. This latter area, commonly known as time series analysis (or trend analysis), focuses on the identification of similar pre-specified patterns.

 

3.1 Mining of Temporal Sequences:

The goal of temporal data mining is to discover hidden relations between sequences and subsequences of events. The discovery of relations between sequences of events involves three steps: the representation and modeling of the data sequence in a suitable form; the definition of similarity measures between sequences; and the application of models and representations to the actual mining problems. A sequence composed of a series of nominal symbols from a particular alphabet is usually called a temporal sequence, and a sequence of continuous, real-valued elements is known as a time series.
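To make the notion of a similarity measure concrete, the following minimal sketch computes two commonly used measures: the Euclidean distance for time series and the edit (Levenshtein) distance for temporal sequences of symbols. The choice of these two measures and the function names are assumptions made for illustration; the text above does not prescribe a particular measure.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length time series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edit_distance(s, t):
    """Levenshtein distance between two sequences of nominal symbols."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

A smaller distance indicates more similar sequences; in practice the measure would be chosen to match the representation used for the data.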

 

3.2 Temporal Sequences Representation:

3.2.1. Time-Domain Continuous Representations:

A simple approach to representing a sequence of real-valued elements (a time series) is to use the initial elements, ordered by their instant of occurrence, without any preprocessing. An alternative is to find a piecewise linear function that describes the entire initial sequence approximately. The objective is to obtain a representation amenable to the detection of significant changes in the sequence.
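One possible way to obtain such a piecewise linear description is a top-down split: a segment is split at its point of largest deviation until every point lies within a tolerance of its segment. The sketch below is only an illustration under that assumption; the function name and the tolerance parameter are hypothetical and are not taken from the cited works.

```python
def piecewise_linear(series, tolerance):
    """Return breakpoint indices of a piecewise linear approximation.

    Linear interpolation between consecutive breakpoints deviates from
    the original series by at most `tolerance` at every sample.
    """
    def split(lo, hi):
        # Find the sample that deviates most from the line lo -> hi.
        worst, worst_err = None, 0.0
        for i in range(lo + 1, hi):
            interp = series[lo] + (series[hi] - series[lo]) * (i - lo) / (hi - lo)
            err = abs(series[i] - interp)
            if err > worst_err:
                worst, worst_err = i, err
        if worst is None or worst_err <= tolerance:
            return [lo, hi]
        # Split at the worst sample and approximate both halves.
        return split(lo, worst)[:-1] + split(worst, hi)

    if len(series) < 2:
        return list(range(len(series)))
    return split(0, len(series) - 1)
```

The returned breakpoints mark where the slope of the series changes noticeably, which is what makes this representation convenient for detecting significant changes.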

 

3.2.2. Transformation Based Representations:

The basic idea of transformation-based representations is to transform the initial sequences from the time domain to another domain, and then to use a point in this new domain to represent each original sequence. One proposal uses the Discrete Fourier Transform (DFT) to transform a sequence from the time domain to a point in the frequency domain.
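As an illustration of the DFT-based representation, the sketch below computes the first k DFT coefficients of a series directly from the definition and uses them as the point that represents the series. Keeping only the first k coefficients, and the function name, are assumptions made for this example.

```python
import cmath


def dft_features(series, k):
    """Return the first k (unnormalized) DFT coefficients of a series."""
    n = len(series)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(series)")
    features = []
    for f in range(k):
        # Direct evaluation of the DFT definition at frequency index f.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)
        )
        features.append(coeff)
    return features
```

The direct evaluation costs O(nk) operations; a fast Fourier transform would normally be used for long sequences.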

 

A more recent approach uses the Discrete Wavelet Transform (DWT) to translate each sequence from the time domain into the time/frequency domain. The DWT is a linear transformation that decomposes the original sequence into different frequency components without losing the information about the instant at which the elements occur.
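The decomposition can be illustrated with the Haar wavelet, the simplest DWT. The sketch below assumes the Haar wavelet and an input whose length is a power of two; neither assumption comes from the text, and the function names are hypothetical.

```python
import math


def haar_step(series):
    """One level of the orthonormal Haar transform on an even-length series."""
    if len(series) % 2:
        raise ValueError("length must be even")
    s = math.sqrt(2)
    # Pairwise sums capture the low-frequency (smooth) part ...
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    # ... and pairwise differences capture the high-frequency part.
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail


def haar_dwt(series):
    """Full Haar decomposition of a series whose length is a power of two.

    Returns the final approximation and the detail coefficients ordered
    from the coarsest level to the finest level.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(series), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)
    return approx, details
```

Each detail coefficient is computed from a specific pair of positions, which is how the time information of the original sequence is retained.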

 

3.3 Temporal Data Mining Tasks:

Data mining has been used in a wide range of applications. Temporal data mining tasks may be grouped as follows:

(i) Prediction,

(ii) Classification,

(iii) Clustering,

(iv) Search and retrieval and

(v) Pattern discovery.

 

The task of time-series prediction has to do with forecasting (typically) future values of the time series based on its past samples. In order to do this, one needs to build a predictive model for the data.
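As one example of such a predictive model, the sketch below uses simple exponential smoothing to produce a one-step-ahead forecast. The choice of model, the function name, and the default smoothing factor are assumptions for illustration only; many other models (for example, autoregressive models) could be used instead.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    `alpha` in (0, 1] controls how strongly recent samples are weighted.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        # Blend the newest sample with the running estimate.
        level = alpha * x + (1 - alpha) * level
    return level
```

With alpha equal to 1 the forecast is simply the last observed value; smaller values of alpha smooth out short-term fluctuations.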

 

4. SPATIAL DATA MINING:

The main difference between data mining in relational database systems (DBS) and in spatial DBS is that attributes of the neighbors of an object of interest may influence the object and therefore have to be considered as well. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood, which are used by spatial data mining algorithms [5].
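A minimal sketch of how a neighborhood relation can be derived from explicit locations is given below. It assumes point objects in the plane and a distance-based neighborhood with a fixed radius; the function names, the radius parameter, and the use of a neighborhood average are hypothetical choices for illustration.

```python
import math


def neighbors_within(points, radius):
    """Map each object id to the ids of objects within `radius` of it.

    `points` maps an object id to its (x, y) location.
    """
    result = {pid: [] for pid in points}
    ids = list(points)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(points[a], points[b]) <= radius:
                result[a].append(b)
                result[b].append(a)
    return result


def neighborhood_mean(values, neighbors, pid):
    """Mean attribute value over the neighbors of `pid`, or None if it has none."""
    near = neighbors[pid]
    if not near:
        return None
    return sum(values[q] for q in near) / len(near)
```

The pairwise comparison is quadratic in the number of objects; a spatial index would be needed for large databases.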

 

4.1 Database Primitives for Spatial Data Mining:

A set of database primitives can be defined for mining in spatial databases; these primitives are sufficient to express most spatial data mining algorithms and can be efficiently supported by a DBMS.

 

4.2 Efficient DBMS Support:

Effective filters allow the search to be restricted to neighborhood paths “leading away” from a starting object. Neighborhood indices materialize certain neighborhood graphs to support efficient processing of the database primitives by a DBMS.

 

5. SPATIAL-TEMPORAL DATA MINING:

Spatiotemporal data mining refers to the extraction of implicit knowledge, spatial-temporal relationships, or other patterns not explicitly stored in spatial-temporal databases.

 

5.1 Spatialization and Temporalization of Data Mining Techniques:

Spatial-temporal data mining represents the confluence of several fields, including spatio-temporal databases, statistics, machine learning, information theory, and geographic visualization. First, spatial and temporal relationships exist among spatial entities at various levels (scales). Both metric (such as distance) and nonmetric (such as topology, direction, and shape) spatial relations, as well as temporal relations (such as before or after), may be explicit or implicit in geographic databases. Second, spatial and temporal dependency and heterogeneity are intrinsic characteristics of spatiotemporal databases. Third, the scale effect in space and time is a challenging research issue in geographic analysis [6].

 

5.2 The Spatial-Temporal Data-Mining Process:

The data-mining process usually consists of three steps:

(1) pre-processing;

(2) modeling and validation; and

(3) post-processing.

 

During the first phase, the data may need some cleaning and transformation according to constraints imposed by tools, algorithms, or users. The second phase consists of choosing or building a model that best reflects the application behavior. Finally, the third step consists of using the model, evaluated and validated in the second step, to study the application behavior effectively.

 

5.3. Spatial-Temporal Data Representation and Infrastructure:

In a review of temporal knowledge discovery, four broad categories of temporality within data are identified:

          Static

          Sequences

          Time stamped

          Fully temporal

 

6. WSN CHALLENGES:

There are various challenges faced by WSNs [2, 7]. Some of them are stated below:

 

6.1 Real-Time:

Sometimes it is necessary to deliver data within a given time or deadline. Not all protocols developed for WSNs meet real-time requirements, so developing real-time protocols is a challenge for WSNs.

 

6.2 Power/ Energy Management:

A large amount of energy is consumed during communication among the nodes. Sensors monitoring critical areas should not run out of battery power, so multiple sensors should be deployed in such areas instead of a single sensor.

 

6.3 Coverage Problems:

One of the fundamental issues that arise in sensor networks, in addition to location calculation, tracking, and deployment, is coverage. Because of the large variety of sensors and applications, coverage is subject to a wide range of interpretations [8].

 

6.4 Security:

To achieve security in a sensor network, security must be integrated into every component of the system. One of the main challenges is how to secure a wireless network against eavesdropping [9].

 

6.5 Anomaly:

Sensor nodes gather data, and there is a high possibility that this data is corrupted. The main focus of this survey is on this challenge.

 

7. RELATED WORK:

Data mining is an essential step in the knowledge discovery process that is concerned with extracting hidden knowledge from vast amounts of data using techniques inspired by different disciplines, such as databases, machine learning, artificial intelligence, and statistics [10]. Recently, data mining techniques have been used to extract patterns from data collected by a WSN. These patterns are usually used to gain insight into the phenomena being monitored. They can also be used to improve the performance of the network.

 

In [11], the authors give an overview of wireless sensor networks and their challenges. One of these challenges, anomaly detection, is a recent area of research in mining sensor data and is surveyed there. The various types of anomalies that can be present in a wireless sensor network are briefly explained. The authors also introduce the architecture used for anomaly detection and give a brief introduction to the techniques.

 

Another paper [12] proposes adaptive methodologies such as the ART model and the PCA technique [13] to mine data in large sensor networks. The authors argue that the structure of the processing architecture of a sensor network must be taken into account for the data mining task. Data clustering algorithms for data spread over a sensor network are necessary in many sensor-network applications. The limited resources and the distributed nature of sensor networks demand a fundamentally distributed algorithmic solution for data clustering. The organization of the sensor network may change according to the sensing task, such as classification or prediction; thus the accuracy and quality of the data mining task must be taken into account.

 

In [14], geographical spatial-temporal correlations are evaluated with the methods of geostatistical interpolation, wavelet data decomposition, fuzzy c-means clustering, and Apriori-based logical rule extraction, respectively. The authors consider a geographical transaction only as a composition of the features of space, time, air temperature, precipitation, and vegetation, that is, as a five-dimensional geographical object.

 

In [15], the authors propose a framework for spatiotemporal knowledge discovery that supports the development of new kinds of knowledge such as spatiotemporal moving patterns. They argue that the proposed framework can represent the definitions and relationships of spatiotemporal data sets and knowledge by using a foundation model for knowledge discovery. The authors evaluate the characteristics of the proposed framework and present some related problems.

In [16], the authors present a comparative study of the classification techniques J48 (decision tree), Naive Bayes, and ZeroR on labeled data from a wireless sensor network. The data was obtained from the Labelled Wireless Sensor Network Data Repository (LWSNDR) and consisted of humidity and temperature measurements collected over a 6-hour period at intervals of 5 seconds. Label ‘0’ denotes normal data and label ‘1’ denotes an introduced event. The performance of the three techniques was reported in terms of accuracy summary, classifier error, and confusion matrix. The experiments found the Naive Bayes algorithm to be more suitable for the dataset used, both for reducing data transmission in the WSN effectively and for implementing classification simply. It was concluded that by classifying the large dataset at the sensor-node level, normal values can be discarded and only the anomalous values transmitted to the central server.
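The idea of classifying readings at the node and transmitting only the anomalous ones can be sketched as follows. This is not the implementation evaluated in [16]; it is a minimal Gaussian Naive Bayes written for illustration, and the function names, the variance floor, and any sample readings used with it are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels):
    """Estimate a class prior and per-feature (mean, variance) for each label."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small floor (an assumption) avoids division by zero variance.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[y] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior for reading x."""
    def log_posterior(prior, stats):
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return lp
    return max(model, key=lambda y: log_posterior(*model[y]))


def readings_to_transmit(model, readings):
    """Keep only readings classified as events (label 1); discard normal ones."""
    return [r for r in readings if predict(model, r) == 1]
```

Here each reading is assumed to be a (humidity, temperature) pair, with label 0 for normal data and label 1 for an event, following the labeling described above.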

 

8. CONCLUSION AND FUTURE WORK:

For spatio-temporal data sets, spatio-temporal data mining is necessary to extract knowledge and information. By applying different algorithms for different data mining techniques, we can choose a suitable algorithm for a given dataset to extract knowledge. This knowledge can be used for different purposes; for example, data transmission can be reduced by sending only the required data to the central server.

 

In the future, we can apply mining to WSN data to increase the lifetime of sensors by finding specific patterns in the temporal sequences and geographical data.

 

9. ACKNOWLEDGMENTS:

The author is thankful to the University Grants Commission, New Delhi, India, for the Minor Research Project. The author is also thankful to Pt. Ravishankar Shukla University, Raipur, India, for its resources, hosting, and necessary support.

 

10. REFERENCES:

[1]   Z. Obradovic, D. Das, V. Radosavljevic, K. Ristovski, S. Vucetic, “Spatio-Temporal Characterization of Aerosols Through Active Use of Data from Multiple Sensors”, ISPRS TC VII Symposium, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7B, pp. 424-429.

[2]   Jiawei Han and Jing Gao, “Research Challenges for Data Mining in Science and Engineering”, University of Illinois at Urbana-Champaign.

[3]   R. Geetha, N. Sumathi and S. Sathiyabama, “A Survey of Spatial, Temporal and Spatio-Temporal Data Mining”, Journal of Computer Applications, Vol. 1, No. 4, Oct-Dec 2008, pp. 31-33.

[4]   S. P. Deshpande and V. M. Thakare, “Data Mining System and Applications: A Review”, International Journal of Distributed and Parallel Systems (IJDPS), Vol. 1, No. 1, September 2010.

[5]   K. Venkateswara Rao, A. Govardhan and K.V. Chalapati Rao, “Spatiotemporal Data Mining: Issues, Tasks and Applications”, International Journal of Computer Science and Engineering Survey (IJCSES) Vol.3, No.1, February 2012.

[6]   Xiaobai Yao, “Research Issues in Spatio-temporal Data Mining”, University Consortium for Geographic Information Science (UCGIS) workshop on Geospatial Visualization and Knowledge Discovery, Lansdowne, Virginia, Nov. 18-20, 2003.

[7]   Shivanajay Marwaha, Jadwiga Indulska, Marius Portmann, “Challenges and Recent Advances in QoS Provisioning in Wireless Mesh Networks”, School of Information Technology and Electrical Engineering, University of Queensland and National ICT Australia (NICTA) Queensland Research Laboratory (QRL), Brisbane, Australia, 978-1-4244-2358-3/2008 IEEE, pp 618-623.

[8]   Seapahn Meguerdichian, Farinaz Koushanfar, Miodrag Potkonjak and Mani B. Srivastava, “Coverage Problems in Wireless Ad-hoc Sensor Networks”, Computer Science Department, University of California, Los Angeles, Rockwell Science Center (RSC) and DARPA.

[9]   Zoran S. Bojkovic, Bojan M. Bakmaz, and Miodrag R. Bakmaz, “Security Issues in Wireless Sensor Networks”, International Journal of Communications Issue 1, Volume 2, 2008.

[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques, second ed. Morgan Kaufmann Publishers, 2006.

[11] Gourav Sahni and Sonia Sharma, “Study of Various Anomalies and Anomaly Detection Methodologies in Wireless Sensor Network”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 5, May 2013, ISSN: 2277 128X, pp. 700-703.

[12] Lambodar Jena, Ramakrushna Swain, Narendra K. Kamila, “Mining Wireless Sensor Network Data: an adaptive approach based on artificial neural networks algorithm”, IJCCT Vol. 1, Issue 2, 3, 4; 2010 for International Conference [ACCTA-2010], pp. 347-353.

[13] M. Birattari, G. Bontempi, and H. Bersini, “Lazy learning meets the recursive least-squares algorithm”, in M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pp. 375–381, Cambridge, MIT Press, 1999.

[14] Hong Shu, Xinyan Zhu, Shangping Dai, “Mining Association Rules in Geographical Spatio-Temporal Data”, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B2, Beijing 2008.

[15] Jun-Wook Lee and Yong-Joon Lee, “A Knowledge Discovery Framework for Spatiotemporal Data Mining”, International Journal of Information Processing Systems, Vol. 2, No. 2, June 2006, pp. 124-129.

[16] Bhawana Parbat, R. K. Dhuware, “Comparative Study of Classification Techniques with Labeled Data in Wireless Sensor Network”, International Journal of Computer Applications (0975 – 8887), Volume 69, No. 11, May 2013.

 

 

 

 

Received on 17.03.2014 Modified on 18.04.2014

Accepted on 03.05.2014      ©A&V Publications All right reserved

Research J.  Science and Tech. 6(2): April- June 2014; Page 79-86